    Active Learning for Dialogue Act Classification

    Active learning techniques were applied to dialogue act classification on two dialogue corpora: the English human-human Switchboard corpus and the Spanish human-machine Dihana corpus. Active learning clearly improves on a baseline obtained by passive learning on the same data sets. An error reduction of 7% was obtained on Switchboard, while on Dihana the amount of labeled data needed for classification was reduced by a factor of five. The passive Support Vector Machine learner used as the baseline itself significantly improves on the state of the art in dialogue act classification on both corpora: on Switchboard it gives a 31% error reduction compared to the previously best reported result.
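The abstract does not say which query strategy the active learner uses. As a minimal sketch, assuming margin-based uncertainty sampling over per-class classifier scores (for example one-vs-rest SVM decision values), the selection step of one active learning round could look like this. The function names and the score values are hypothetical.

```python
from typing import List, Sequence

def margin(scores: Sequence[float]) -> float:
    """Gap between the two highest class scores (needs >= 2 classes).
    A small gap means the classifier is unsure which dialogue act to assign."""
    best, second = sorted(scores, reverse=True)[:2]
    return best - second

def select_queries(pool_scores: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Indices of the batch_size unlabeled utterances with the smallest margin;
    these are the ones sent to a human annotator in the next round."""
    order = sorted(range(len(pool_scores)), key=lambda i: margin(pool_scores[i]))
    return order[:batch_size]

# Hypothetical per-class decision values (e.g. one-vs-rest SVM scores)
# for four unlabeled utterances and three dialogue act labels.
pool = [
    [2.1, -0.5, -1.3],   # confidently label 0
    [0.4, 0.35, -0.9],   # nearly tied between labels 0 and 1
    [-1.0, 1.8, 0.2],
    [0.1, -0.2, 0.0],    # fairly close between labels 0 and 2
]
print(select_queries(pool, batch_size=2))  # -> [1, 3]
```

In a full loop, the selected utterances are labeled, added to the training set, the classifier is retrained, and the pool is rescored; a passive baseline would instead pick utterances at random.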

    Methods for Amharic part-of-speech tagging

    The paper describes a set of experiments applying three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers performed worse than previously reported results for English, in particular on unknown words. The best results were obtained with a Maximum Entropy approach, while the HMM-based and SVM-based taggers achieved comparable results.
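To illustrate what an HMM-based tagger computes, here is a minimal Viterbi decoder for a first-order HMM. This is a generic sketch, not the tagger used in the paper: the tags, words, and probabilities are hypothetical toy values, and the fixed floor probability for unknown words (`unk`) is an assumed, deliberately crude stand-in for the unknown-word handling the abstract identifies as a weak point.

```python
import math

def viterbi(words, tags, start, trans, emit, unk=1e-6):
    """Most probable tag sequence under a first-order HMM, in log space.
    Words missing from emit[tag] get the floor probability unk for that tag."""
    def e(tag, word):
        return math.log(emit[tag].get(word, unk))
    # best[t]: log-prob of the best path ending in tag t at the current word
    best = {t: math.log(start[t]) + e(t, words[0]) for t in tags}
    back = []  # back[i][t]: previous tag on the best path into t at word i+1
    for word in words[1:]:
        ptr, new = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + math.log(trans[p][t]))
            ptr[t] = prev
            new[t] = best[prev] + math.log(trans[prev][t]) + e(t, word)
        back.append(ptr)
        best = new
    # Trace back from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy model with two tags (N = noun, V = verb).
TAGS = ["N", "V"]
START = {"N": 0.7, "V": 0.3}
TRANS = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"man": 0.5, "house": 0.5}, "V": {"went": 0.9, "house": 0.1}}

print(viterbi(["man", "house", "went"], TAGS, START, TRANS, EMIT))  # -> ['N', 'N', 'V']
```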

    Evaluation of Combining Several Statistical Methods with a Flexible Cutoff for Identifying Differentially Expressed Genes in Pairwise Comparison of EST Sets

    The detection of differentially expressed genes from EST data is important for the discovery of potential biological or pharmaceutical targets, especially when studying biological processes in less-characterized organisms where large-scale microarrays are not an option. We present a comparison of five different statistical methods for identifying up-regulated genes through pairwise comparison of EST sets, where one set is generated from a treatment and the other serves as a control. In addition, we specifically address situations where the sets are relatively small (~2,000–10,000 ESTs) and may differ in size. The methods were tested on both simulated and experimentally derived data, and the results were compared with a collection of cold-stress-induced genes identified by microarrays. We found that combining the method proposed by Audic and Claverie with Fisher’s exact test and a method based on the difference in relative frequency was the best combination for maximizing the detection of up-regulated genes. We also introduced a flexible cutoff that takes the size of the EST sets into account, as an alternative to a static cutoff. Finally, the detected genes showed low overlap with those identified by microarrays, which, as in previous studies, indicates low overall concordance between the two platforms.
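The three statistics named in the abstract can be computed from four counts: x occurrences of a gene in a control set of n1 ESTs and y occurrences in a treatment set of n2 ESTs. The sketch below uses the published Audic–Claverie formula p(k | x) = r^k (x+k)! / (x! k! (1+r)^(x+k+1)) with r = n2/n1, a one-sided hypergeometric (Fisher's exact) tail, and a plain frequency difference. How the paper combines the three results and how its flexible cutoff is defined are not given here, so neither is attempted; the function names and example counts are hypothetical.

```python
import math

def audic_claverie_p(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided Audic-Claverie P-value P(Y >= y | x): a gene is seen x times
    in a control set of n1 ESTs and y times in a treatment set of n2 ESTs."""
    r = n2 / n1

    def log_p(k: int) -> float:
        # log p(k | x) = log[ r^k (x+k)! / (x! k! (1+r)^(x+k+1)) ]
        return (k * math.log(r) + math.lgamma(x + k + 1) - math.lgamma(x + 1)
                - math.lgamma(k + 1) - (x + k + 1) * math.log1p(r))

    # p(k | x) sums to 1 over k >= 0, so the upper tail is 1 - P(Y < y).
    # Adequate for usual thresholds; tiny P-values lose relative precision.
    return max(0.0, 1.0 - sum(math.exp(log_p(k)) for k in range(y)))


def fisher_one_sided_p(x: int, y: int, n1: int, n2: int) -> float:
    """One-sided Fisher's exact test for enrichment in the treatment set:
    hypergeometric tail P(Y >= y) given x + y occurrences over n1 + n2 ESTs."""
    k, total = x + y, n1 + n2
    tail = sum(math.comb(k, j) * math.comb(total - k, n2 - j)
               for j in range(y, min(k, n2) + 1))
    return tail / math.comb(total, n2)


def rel_freq_diff(x: int, y: int, n1: int, n2: int) -> float:
    """Difference in relative frequency, treatment minus control."""
    return y / n2 - x / n1


# Hypothetical gene: 0 of 2,000 control ESTs, 12 of 3,000 treatment ESTs.
x, y, n1, n2 = 0, 12, 2000, 3000
print(audic_claverie_p(x, y, n1, n2), fisher_one_sided_p(x, y, n1, n2),
      rel_freq_diff(x, y, n1, n2))
```

`math.comb` requires Python 3.8 or later; exact integer arithmetic keeps the Fisher tail accurate even for large EST sets.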

    Putative cold acclimation pathways in Arabidopsis thaliana identified by a combined analysis of mRNA co-expression patterns, promoter motifs and transcription factors

    BACKGROUND: With the advent of microarray technology, it has become feasible to identify virtually all genes in an organism that are induced by developmental or environmental changes. However, gene expression data alone may be of limited value if the aim is to infer the underlying genetic networks. Computational methods that combine microarray data with other information sources are therefore needed. Here we describe one such method. RESULTS: Using our method, previously published Arabidopsis microarray data from cold-acclimated plants at six different time points, promoter motif sequence data extracted from ~24,000 Arabidopsis promoters, and known transcription factor binding sites were combined to construct a putative genetic regulatory interaction network. The inferred network includes both previously characterised and hitherto undescribed regulatory interactions between transcription factor (TF) genes and genes that encode other TFs or other proteins. Part of the resulting transcription factor regulatory network is presented here; more detailed information is available in the additional files.
CONCLUSION: The rule-based method described here can be used to infer genetic networks by combining data from microarrays, promoter sequences and known promoter binding sites. In principle, the method should be applicable to any biological system. We tested it on the cold acclimation process in Arabidopsis and identified a more complex putative genetic regulatory network than previously described. However, information on specific binding sites for individual TFs was in most cases not available, so gene targets were predicted for entire TF gene families. In addition, the networks were built solely by a bioinformatics approach, and experimental verification will be necessary for their final validation. On the other hand, since our method highlights putative novel interactions, more directed experiments can now be performed.
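The abstract does not spell out the inference rules. Purely as a hypothetical illustration of a rule of this kind, the sketch links a TF gene to a target gene when the target's promoter carries a motif assigned to the TF's family and the two expression profiles over the time course are positively correlated. The correlation threshold, gene names, profiles, and motif sets are all assumptions; CCGAC (the CRT/DRE core) is used only as an example motif.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)

def infer_edges(expr, tf_family, family_motifs, promoter_motifs, min_r=0.8):
    """Hypothetical rule: TF -> gene if (1) the gene's promoter contains a motif
    bound by the TF's family and (2) their profiles correlate with r >= min_r."""
    edges = []
    for tf, family in tf_family.items():
        for gene, motifs in promoter_motifs.items():
            if gene == tf:
                continue  # no self-regulation edges in this sketch
            if family_motifs[family] & motifs and pearson(expr[tf], expr[gene]) >= min_r:
                edges.append((tf, gene))
    return edges

# Hypothetical expression values at six time points.
expr = {
    "TF_A": [0, 2, 5, 6, 5, 4],
    "G1": [0, 1, 4, 6, 6, 5],  # correlated, has motif -> linked
    "G2": [5, 4, 3, 2, 1, 0],  # has motif, anti-correlated -> not linked
    "G3": [0, 2, 5, 6, 5, 4],  # correlated, lacks motif -> not linked
}
tf_family = {"TF_A": "AP2/ERF"}
family_motifs = {"AP2/ERF": {"CCGAC"}}
promoter_motifs = {"TF_A": {"CCGAC"}, "G1": {"CCGAC"}, "G2": {"CCGAC"}, "G3": {"ACGT"}}

print(infer_edges(expr, tf_family, family_motifs, promoter_motifs))  # -> [('TF_A', 'G1')]
```

Because the motif set is looked up per TF family rather than per TF, every member of a family receives the same candidate targets, which mirrors the family-level prediction the abstract describes.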

    Network Properties for Ranking Predicted miRNA Targets in Breast Cancer

    MicroRNAs (miRNAs) control the expression of their target genes through translational repression and mRNA cleavage. They are involved in various biological processes, including the development and progression of cancer. To uncover the biological role of miRNAs, it is important to identify their target genes. Because few target genes have been validated experimentally, computational prediction methods are very important. However, state-of-the-art prediction tools return a large number of putative targets with an unknown number of false positives. In this paper, we propose and evaluate two approaches for ranking the biological relevance of putative targets of miRNAs associated with breast cancer.
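The abstract does not name the network properties used. As one simple possibility, assumed here for illustration only, putative targets could be ranked by their degree (number of distinct interaction partners) in an undirected gene or protein interaction network. The gene names and interactions below are hypothetical.

```python
from collections import defaultdict

def rank_by_degree(candidates, interactions):
    """Rank putative miRNA targets by degree in an undirected interaction network.
    Duplicate edges and self-loops are ignored; ties are broken alphabetically."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(candidates, key=lambda g: (-len(neighbors[g]), g))

# Hypothetical predicted targets and interaction edges.
candidates = ["GENE_C", "GENE_A", "GENE_B", "GENE_D"]
interactions = [
    ("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"), ("GENE_A", "GENE_E"),
    ("GENE_B", "GENE_C"), ("GENE_A", "GENE_B"),  # repeated edge counted once
]
print(rank_by_degree(candidates, interactions))
# -> ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
```

Degree is only the simplest centrality measure; other properties such as betweenness could be swapped into the sort key in the same way.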

    Generation and analysis of 9792 EST sequences from cold acclimated oat, Avena sativa

    BACKGROUND: Oat is an important crop in North America and northern Europe. In Scandinavia, yields are limited by the fact that oat cannot be grown as a winter crop. To develop such a crop, more knowledge about the mechanisms of cold tolerance in oat is required. RESULTS: From an oat cDNA library, 9792 single-pass EST sequences were obtained. The library was prepared from pooled RNA samples isolated from leaves of four-week-old Avena sativa (oat) plants incubated at +4°C for 4, 8, 16 and 32 hours. Excluding sequences shorter than 100 bp yielded 8508 high-quality ESTs with a mean length of 710.7 bp. Clustering and assembly identified a set of 2800 different transcripts, denoted the Avena sativa cold induced UniGene set (AsCIUniGene set). Using various tools and databases, putative functions were assigned to 1620 (58%) of these genes. Of the remaining 1180 unclassified sequences, 427 appeared to be oat-specific, since they lacked any significant sequence similarity (BLAST E-values > 10^-10) to any sequence in the public databases. Of the 2800 UniGene sequences, 398 displayed significant homology (BLASTX E-values ≤ 10^-10) to genes previously reported to be involved in cold-stress-related processes. In addition, 107 novel oat transcription factors were identified, 51 of which were similar to genes previously shown to be cold induced. The CBF transcription factors play a major role in regulating cold acclimation. Four oat CBF sequences were found, belonging to the monocot cluster of DREB family ERF/AP2 domain proteins. Finally, approximately 400 potential SSRs were found in the total EST sequence data (5.3 Mbp), a frequency similar to that previously identified in Arabidopsis ESTs.
CONCLUSION: The AsCIUniGene set will now be used to fabricate an oat biochip, to perform expression studies with different oat cultivars incubated at varying temperatures, to generate molecular markers, and to provide tools for genetic transformation experiments in oat. This will lead to a better understanding of the cellular biology of this important crop and will open up new ways to improve its agronomic properties.
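Two processing steps from the abstract, the 100 bp length filter and the search for simple sequence repeats (SSRs), can be sketched in a few lines. The SSR criteria used in the study are not stated, so the minimum copy numbers per motif length below are assumed, illustrative values; the scan reports perfect tandem repeats only, and the example reads are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed SSR thresholds: minimum copy number per motif length (1-6 bp).
# Illustrative only; published studies use differing criteria.
MIN_COPIES: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def filter_ests(seqs: List[str], min_len: int = 100) -> List[str]:
    """Keep ESTs of at least min_len bp (the abstract's 100 bp cutoff)."""
    return [s for s in seqs if len(s) >= min_len]

def is_primitive(motif: str) -> bool:
    """False for motifs that repeat a shorter unit, e.g. 'ATAT' or 'AA'."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq: str, min_copies: Dict[int, int] = MIN_COPIES) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    hits = []
    for k, need in min_copies.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Count how many times motif repeats back-to-back from position i.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= need and is_primitive(motif) and "N" not in motif:
                hits.append((i, motif, copies))
                i += copies * k  # skip past this tract
            else:
                i += 1
    return sorted(hits)

reads = ["ACGT" * 20, "ACGTTGCA" * 15]  # hypothetical 80 bp and 120 bp reads
kept = filter_ests(reads)
print(len(kept), sum(map(len, kept)) / len(kept))  # -> 1 120.0
print(find_ssrs("GGG" + "AT" * 7 + "C" + "A" * 10 + "T" + "CAG" * 5 + "TT"))
# -> [(3, 'AT', 7), (18, 'A', 10), (29, 'CAG', 5)]
```

The primitivity check stops a mononucleotide run such as AAAAAAAAAA from also being reported as a dinucleotide (AA) repeat.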